Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

Targeted Gene Metagenomic Data Analysis ◾ 289

qiime feature-table summarize \
  --i-table dada2/table_feat_sample_freq_filtered_yoga_dada2.qza \
  --m-sample-metadata-file data/sample-metadata.tsv \
  --o-visualization dada2/table_feat_sample_freq_filtered_yoga_dada2.qzv

qiime tools view dada2/table_feat_sample_freq_filtered_yoga_dada2.qzv

You can use “qiime feature-table filter-samples --help” and “qiime feature-table filter-features --help” to learn more about the filtering parameters.
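As a rough illustration of the kind of filtering those help pages describe, the sketch below drops low-depth samples and then rare features from the feature table summarized above. The thresholds (1000 and 2) and the two output file names are placeholders assumed for this example, not values taken from this chapter. The QIIME2 calls are skipped when "qiime" or the input table is not available.

```shell
#!/usr/bin/env bash
# Sketch only: thresholds and output names are assumptions for illustration.
TABLE="dada2/table_feat_sample_freq_filtered_yoga_dada2.qza"
SAMPLE_FILTERED="dada2/table_min_freq_samples.qza"       # hypothetical output name
FEATURE_FILTERED="dada2/table_min_samples_features.qza"  # hypothetical output name
MIN_SAMPLE_FREQ=1000    # assumed: drop samples with a total frequency below 1000
MIN_FEATURE_SAMPLES=2   # assumed: drop features observed in fewer than 2 samples

if command -v qiime >/dev/null 2>&1 && [ -f "$TABLE" ]; then
  # Keep only samples whose total frequency is at least MIN_SAMPLE_FREQ.
  qiime feature-table filter-samples \
    --i-table "$TABLE" \
    --p-min-frequency "$MIN_SAMPLE_FREQ" \
    --o-filtered-table "$SAMPLE_FILTERED"

  # From that table, keep only features present in at least MIN_FEATURE_SAMPLES samples.
  qiime feature-table filter-features \
    --i-table "$SAMPLE_FILTERED" \
    --p-min-samples "$MIN_FEATURE_SAMPLES" \
    --o-filtered-table "$FEATURE_FILTERED"
else
  echo "qiime or the input table was not found; skipping the filtering sketch"
fi
```

Suitable thresholds depend on the sequencing depth of your own samples, so it is worth inspecting the summary visualization before choosing them.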

The preprocessing steps followed above produce the feature tables and features (OTUs/ASVs)
on which the downstream analysis relies (taxonomic classification, phylogenetic relationships,
alpha and beta diversity analysis, and differential abundance). Several questions commonly
come up at this point: Which approach is better, clustering or denoising? Which clustering
method is best: de novo, closed-reference, or open-reference? And which denoising method is
better, DADA2 or Deblur? The answer many experts give is to try all of them and adopt the
one that works for you. A number of articles discuss the pros and cons of each of these
methods.

7.3.5 Taxonomic Assignment with QIIME2

As discussed above, one of the main goals of metagenomic studies is to identify the microbial
organisms present in a sample using either alignment-based classifiers or machine learning
classifiers. QIIME2 provides alignment-based, machine learning, and hybrid classifier methods
in the “q2-feature-classifier” plugin. The alignment-based classifier methods are
“classify-consensus-blast” and “classify-consensus-vsearch”. The machine learning classifier
method is “classify-sklearn”, which applies a pre-fitted sklearn-based taxonomy classifier
(built on any of the classifiers available in the scikit-learn Python package). You can
download a shared pre-fitted classifier; however, it is safer to train your own and use it
for taxonomy assignment. Some pre-fitted naïve Bayes classifiers and weighted taxonomic
classifiers are available on the QIIME2 data resources web page at
“https://docs.qiime2.org/2022.2/data-resources/” or the equivalent page of any newer release.
To train your own taxonomy classifier, you can use either of the two training methods
provided by “q2-feature-classifier”: “fit-classifier-naive-bayes” to train a naïve Bayes
classifier, or “fit-classifier-sklearn” to train an arbitrary scikit-learn classifier. A
hybrid classifier (VSEARCH + sklearn classifier), still at the alpha stage, is provided by
the “classify-hybrid-vsearch-sklearn” method.
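The train-then-classify workflow described above might look roughly like the following sketch. All file names (the reference sequences and taxonomy, the representative sequences, and the outputs) are placeholders assumed for this example; substitute the artifacts from your own reference database and from your clustering or denoising step. The QIIME2 calls are skipped when "qiime" or the inputs are not available.

```shell
#!/usr/bin/env bash
# Sketch only: every file name below is a hypothetical placeholder.
REF_SEQS="ref-seqs.qza"            # assumed reference sequences artifact
REF_TAX="ref-taxonomy.qza"         # assumed taxonomy artifact matching REF_SEQS
REP_SEQS="dada2/rep-seqs.qza"      # assumed representative sequences to classify
CLASSIFIER="classifier.qza"        # trained classifier (output)
TAXONOMY="taxonomy.qza"            # taxonomy assignments (output)
TAXONOMY_VIZ="${TAXONOMY%.qza}.qzv"

if command -v qiime >/dev/null 2>&1 \
   && [ -f "$REF_SEQS" ] && [ -f "$REF_TAX" ] && [ -f "$REP_SEQS" ]; then
  # Train a naive Bayes classifier on the reference sequences and their taxonomy.
  qiime feature-classifier fit-classifier-naive-bayes \
    --i-reference-reads "$REF_SEQS" \
    --i-reference-taxonomy "$REF_TAX" \
    --o-classifier "$CLASSIFIER"

  # Assign taxonomy to the representative sequences with the trained classifier.
  qiime feature-classifier classify-sklearn \
    --i-classifier "$CLASSIFIER" \
    --i-reads "$REP_SEQS" \
    --o-classification "$TAXONOMY"

  # Tabulate the assignments so they can be opened with "qiime tools view".
  qiime metadata tabulate \
    --m-input-file "$TAXONOMY" \
    --o-visualization "$TAXONOMY_VIZ"
else
  echo "qiime or the input artifacts were not found; skipping the classifier sketch"
fi
```

One practical reason for training your own classifier is that a pre-fitted classifier generally has to match the scikit-learn version bundled with the QIIME2 release you are running.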

7.3.5.1 Using Alignment-Based Classifiers

The alignment-based classifiers use each of the representative sequences generated by
clustering or denoising as a query sequence to search against the database's representative
sequences, whose taxa are known. The taxonomy assignment is based on the consensus of the
BLAST hits with a percent identity greater than a predetermined identity threshold. The
taxonomy ranks, from the highest to the lowest, are kingdom, phylum, class, order, family,
genus, and species. The confidence in the assignment is the fraction of top hits that match